NSF PAR Search | NSF Public Access Repository

Full Text Available
BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering

https://doi.org/10.18653/v1/2024.emnlp-main.58

Wang, Haoyu; Li, Ruirui; Jiang, Haoming; Tian, Jinjin; Wang, Zhengyang; Luo, Chen; Tang, Xianfeng; Cheng, Monica Xiao; Zhao, Tuo; Gao, Jing (November 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing)

Full Text Available
From local to global gene co-expression estimation using single-cell RNA-seq data

https://doi.org/10.1093/biomtc/ujae001

Tian, Jinjin; Lei, Jing; Roeder, Kathryn (March 2024, Biometrics)

In genomics studies, the investigation of gene relationships often brings important biological insights. Currently, large heterogeneous datasets impose new challenges for statisticians because gene relationships are often local: they change from one sample point to another, may exist only in a subset of the sample, and can be nonlinear or even nonmonotone. Most previous dependence measures do not specifically target local dependence relationships, and the ones that do are computationally costly. In this paper, we explore a state-of-the-art network estimation technique that characterizes gene relationships at the single-cell level, under the name of cell-specific gene networks. We first show that averaging the cell-specific gene relationship over a population gives a novel univariate dependence measure, the averaged Local Density Gap (aLDG), that accumulates local dependence and can detect any nonlinear, nonmonotone relationship. Together with a consistent nonparametric estimator, we establish its robustness on both the population and empirical levels. Then, we show that averaging the cell-specific gene relationship over mini-batches determined by some external structure information (e.g., a spatial or temporal factor) better highlights meaningful local structure change points. We explore the application of aLDG and its mini-batch variant in many scenarios, including pairwise gene relationship estimation, bifurcating point detection in cell trajectory, and spatial transcriptomics structure visualization. Both simulations and real data analysis show that aLDG outperforms existing dependence measures.
Online control of the familywise error rate

https://doi.org/10.1177/0962280220983381

Tian, Jinjin; Ramdas, Aaditya (April 2021, Statistical Methods in Medical Research)
Biological research often involves testing a growing number of null hypotheses as new data are accumulated over time. We study the problem of online control of the familywise error rate, that is, testing an a priori unbounded sequence of hypotheses (p-values) one by one over time without knowing the future, such that with high probability there are no false discoveries in the entire sequence. This paper unifies algorithmic concepts developed for offline (single batch) familywise error rate control and online false discovery rate control to develop novel online familywise error rate control methods. Though many offline familywise error rate methods (e.g., Bonferroni, fallback procedures and Sidak’s method) can trivially be extended to the online setting, our main contribution is the design of new, powerful, adaptive online algorithms that control the familywise error rate when the p-values are independent or locally dependent in time. Our numerical experiments demonstrate substantial gains in power, which are also formally proved in an idealized Gaussian sequence model. A promising application to the International Mouse Phenotyping Consortium is described.
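The remark that Bonferroni extends trivially to the online setting can be illustrated with a minimal sketch. This is only the baseline extension, not the adaptive algorithms the paper proposes. The function name `online_bonferroni` and the spending sequence 6 / (pi^2 i^2) are assumptions chosen for illustration; any nonnegative sequence summing to at most 1 would give the same union-bound guarantee.

```python
import math


def online_bonferroni(p_values, alpha=0.05):
    """Test p-values one at a time, spending alpha over an unbounded sequence.

    Hypothetical helper for illustration. The i-th hypothesis is tested at
    level alpha * gamma_i with gamma_i = 6 / (pi^2 * i^2). The gamma_i sum
    to 1 over all i, so by the union bound the familywise error rate stays
    at or below alpha however long the sequence runs.
    """
    norm = 6.0 / math.pi ** 2
    for i, p in enumerate(p_values, start=1):
        level = alpha * norm / i ** 2  # test level for hypothesis i
        yield i, level, p <= level      # decision uses only past and present


# Made-up p-values, processed in arrival order.
decisions = list(online_bonferroni([0.001, 0.2, 0.0004, 0.03]))
for index, level, rejected in decisions:
    print(f"H{index}: level={level:.5f} rejected={rejected}")
```

Because the test levels shrink like 1/i^2, later hypotheses are tested very conservatively, which is the loss of power that adaptive online procedures aim to reduce.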
Full Text Available
